Skip to content

C++: Support reasoning about whether a phi node overwrites the entire buffer#21836

Open
MathiasVP wants to merge 13 commits into
github:mainfrom
MathiasVP:uncertain-def-more-complete
Open

C++: Support reasoning about whether a phi node overwrites the entire buffer#21836
MathiasVP wants to merge 13 commits into
github:mainfrom
MathiasVP:uncertain-def-more-complete

Conversation

@MathiasVP
Copy link
Copy Markdown
Contributor

@MathiasVP MathiasVP commented May 12, 2026

This PR is a combination of two fixes with the overarching theme of computing which SSA definitions are certain (which, in this context, means "overwrites the entire buffer").

A brief reminder of what we use "certain" vs. "uncertain" SSA writes for: In order to model flow through something like

p[i] = source();
p[j] = 0;
sink(p[k]);

(without knowing the concrete values for i, j, and k) we model the write to p[i] and p[j] as "uncertain" writes. That is, the write to the underlying SSA variable isn't overwritten. So whatever value was previously tracked by SSA is not killed by the SSA definition.

  • The first improvement is in 8585bb6 which changes some of the SSA writes to be certain (whereas before they were only certain if the address they were writing to didn't involve pointer arithmetic).

  • The next change is slightly less well-motivated. This change is most likely necessary if we ever want to switch the instantiation of the shared range analysis library over from the sound IR-based SSA to dataflow-based SSA.

    On main we currently consider any definition from a PhiNode to be an uncertain definition. This PR changes this so that we consider a PhiNode to be a certain definition whenever its inputs are all certain definitions.

    There are some technical things that make this slightly more complicated than what you would expect: Since a PhNode can be its own input we need recursion over a graph with cycles, and because of QL's least fixed-point semantics we don't get the right result when doing recursion over a graph with cycles. So we instead recurse over a graph whose nodes are the the recursive PhiNodes.

    As always, recursion through forall is also dangerous and should ideally be rewritten using explicit recursion over integers to reduce the risk of performance problems. However, I'll delay that until we actually see a performance problem on a database.

    (I expect this work to be necessary for switching the instantiation of the shared range analysis library over to dataflow-based SSA because I expect the relevant SSA variables for range analysis to be those for which any SSA definition is a certain SSA definition.)

DCA shows a couple of lost results. They're all FPs of the form added in 8e25240 (which this PR fixes) 🎉

@github-actions github-actions Bot added the C++ label May 12, 2026
@MathiasVP MathiasVP added no-change-note-required This PR does not need a change note and removed C++ labels May 12, 2026
@github-actions github-actions Bot added the C++ label May 12, 2026
@MathiasVP MathiasVP force-pushed the uncertain-def-more-complete branch from 0df8ed9 to 07b8d7e Compare May 13, 2026 12:15
@MathiasVP MathiasVP marked this pull request as ready for review May 13, 2026 12:22
@MathiasVP MathiasVP requested a review from a team as a code owner May 13, 2026 12:22
Copilot AI review requested due to automatic review settings May 13, 2026 12:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the C++ IR-based SSA/dataflow implementation to better classify SSA definitions as “certain” (overwrites the entire buffer), including propagating certainty through PhiNodes, and adjusts/extends tests to cover the new behavior.

Changes:

  • Refines write-certainty tracking by replacing a boolean “certain” flag with a Certainty type and updating write handling accordingly.
  • Changes PhiNode certainty so phi definitions can be “certain” when their inputs are certain, including SCC-based handling for phi cycles.
  • Updates existing dataflow test expectations and adds a new inline-expectations test suite for “certain/uncertain” SSA definitions.
Show a summary per file
File Description
cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/SsaImplCommon.qll Introduces Certainty-typed write certainty and updates write classification logic.
cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/SsaImpl.qll Adjusts SSA source-variable stringification and implements phi-node certainty propagation (incl. cycle handling).
cpp/ql/test/library-tests/dataflow/dataflow-tests/test.cpp Adds a regression test case exercising behavior around certainty/uninitialized flow in a loop.
cpp/ql/test/library-tests/dataflow/certain/test.ql Adds an inline expectations harness to assert Ssa::Definition.isCertain() outcomes.
cpp/ql/test/library-tests/dataflow/certain/test.cpp Adds C++ scenarios to validate “certain vs uncertain” SSA defs and phi behavior.
cpp/ql/test/library-tests/dataflow/certain/test.expected Empty expected file for the new inline expectations test.
cpp/ql/test/library-tests/dataflow/dataflow-tests/uninitialized.expected Updates expected results for uninitialized/dataflow tests.
cpp/ql/test/library-tests/dataflow/dataflow-tests/test-source-sink.expected Updates expected source-to-sink flow results.
cpp/ql/test/library-tests/dataflow/dataflow-tests/localFlow-ir.expected Updates expected IR local flow output (notably SSA phi read formatting).
cpp/ql/test/library-tests/dataflow/dataflow-tests/dataflow-consistency.expected Updates expected dataflow consistency outputs.
cpp/ql/test/experimental/query-tests/Security/CWE/CWE-193/constant-size/ConstantSizeArrayOffByOne.expected Updates experimental query-test expectations to match new flow behavior.

Copilot's findings

  • Files reviewed: 10/11 changed files
  • Comments generated: 3

Comment thread cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/SsaImplCommon.qll
Comment thread cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/SsaImpl.qll Outdated
Comment thread cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/SsaImpl.qll
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@aschackmull
Copy link
Copy Markdown
Contributor

  • On main we currently consider any definition from a PhiNode to be an uncertain definition. This PR changes this so that we consider a PhiNode to be a certain definition whenever its inputs are all certain definitions.

Please elaborate on this. As-is this makes no sense to me, since I would not consider "certainness" to be a meaningful property of phi nodes.

Comment on lines +139 to +141
if ind = 0
then result = "&" + base.toString()
else result = repeatStars(this.getIndirection() - 1) + base.toString()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixing direct field access ind with this.getIndirection() becomes very confusing to read. Same issue below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d93de54

Comment on lines +1384 to +1388
private predicate hasAnInput(PhiNode phi1, PhiNode phi2) {
definitionCycle(phi1) and
definitionCycle(phi2) and
getAnInput(phi1) = phi2
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong. You're attempting to define SCC-internal edges here, but if two SCCs are adjacent then this will accidentally merge them. You need this instead:

private predicate sccEdge(PhiNode phi1, PhiNode phi2) {
  getAnInput(phi1) = phi2 and getAnInput+(phi2) = phi1
}

Copy link
Copy Markdown
Contributor Author

@MathiasVP MathiasVP May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, yes I can see how these two definitions aren't equivalent and that yours captures what I intended to capture. Now I'm just trying to construct an example where the two definitions would cause a change in a test 🤔

If you have one at hand I'd love to know. Otherwise I'll spend some time trying to construct one, and if all else fail I'll just replace the code with your suggestion as it's obviously the right one

Copy link
Copy Markdown
Contributor

@aschackmull aschackmull May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this perhaps:

while (..) { // phi_1
  if (..) x = ..
  // phi_2
}
while (..) { // phi_3
  if (..) x = ..
  // phi_4
}

Here phi_1 + phi_2 ought to form one SCC and phi_3 + phi_4 another, and your code would conflate them. I think if you make the assignment in the first loop certain, and the one in the second uncertain then you should end up with an observable difference.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That indeed did the trick. f77d426 adds the test and f5113b1 fixes it using your definition. Thanks!

Comment on lines +1403 to +1405
forex(PhiNode phi | phi = this.getAPhiNode() |
forall(PhiNode inp | phi.getAnInput() = inp and not this.hasPhiNode(inp) | inp.isCertain())
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Factor out the range to make the recursion simpler. Add a member predicate

pragma[nomagic]
Definition getAnInput() {
  result = this.getAPhiNode().getAnInput() and not this.hasPhiNode(result)
}

and use this to define the recursive forall:

forex(Definition inp | this.getAnInput() | inp.isCertain())

Note also that I've changed the type of inp from PhiNode to Definition - I believe that was also wrong in your version.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c6ce13a

@MathiasVP
Copy link
Copy Markdown
Contributor Author

Thanks for the comments, Schack!

Please elaborate on this. As-is this makes no sense to me, since I would not consider "certainness" to be a meaningful property of phi nodes.

Forget the "certainness" naming for the moment and just consider it to be a property of whether an SSA definition overwrites the entire buffer pointed to. For example:

void f(int* p) {
  if(b) {
    *(p + 1) = 1; // (1)
  } else {
    *(p + 1) = 2; // (2)
  }
  // (3)
}

we don't know how large the buffer pointed to by p is so the definitions at (1) and (2) are uncertain. Thus, we don't know whether the phi for *p at (3) overwrites the entire buffer pointed to by p.

Does that make things more clear?

@aschackmull
Copy link
Copy Markdown
Contributor

Does that make things more clear?

Sort of, yes. But I still think the concept is flawed. Firstly, a phi is not a write - and if you do think of it as a write in the sense x_3 = phi(x_1, x_2) then I'd say that it always overwrites the entire buffer by definition. In your example, I'd say that the phi at (3) is fully defined by its SSA inputs from (1) and (2), so I'd always consider it to be "certain". Now those two prior definitions are themselves uncertain, so they're in turn defined by a combination of the RHS and the prior SSA definition (the init at entry). So an uncertain write is sort of like a certain write plus a phi node all wrapped into one SSA def.

@MathiasVP
Copy link
Copy Markdown
Contributor Author

So an uncertain write is sort of like a certain write plus a phi node all wrapped into one SSA def.

That's also a perfectly good way one could model it, yes. In fact, I think that's exactly what is captured by a Chi instruction which we don't use in dataflow because it's based on the IR's way-too-imprecise-to-be-useful-in-dataflow memory SSA.

Anyway, obviously this is not a direction I think we should go in for this PR, but I appreciate the comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C++ no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants